The paper discusses a large-scale programming system, the
Quasi-Optimizer (QO), that has four major objectives:

(i) To observe and measure adversaries' behavior in a
competitive environment, to infer their strategies and to
construct a computer model, a descriptive theory, of each;

(ii) To identify strategy components, evaluate their
effectiveness and to select the most satisfactory ones from a set
of computed descriptive theories;

(iii) To combine these components in a quasi-optimum
strategy that represents a normative theory in the statistical
sense;

(iv) To inform a meta-strategy of the regions in which a given
strategy is most proficient. The meta-strategy will then
shift the domain of confrontations between the strategy and its
adversaries to the regions specified and thereby increase the
effective quality of the strategy.

The first of six fairly independent modules of the QO
system, QO-1, constructs a descriptive theory of static
strategies given as black-box programs impenetrable by QO-1. It
also identifies which of all possible decision variables are
relevant for the strategy being modelled. The program can use
either an exhaustive search pattern or a binary chopping
technique in the space of decision variables while carrying out a
sequence of controlled experiments on the strategy. As an
inductive discovery feature, it can also correlate certain
stochastic consequences of the strategy with subranges of values
of each decision variable. The strategy response surface is
assumed by QO-1 to be weakly monotonic.
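
The binary chopping idea can be illustrated with a minimal sketch.
It is not the QO-1 implementation: it assumes a single decision
variable and a discrete response, and the names find_switch_point,
strategy, lo, hi and tol are hypothetical. Under the weak
monotonicity assumption, equal responses at both ends of a range
imply a constant response over it, which is also one way to judge a
variable irrelevant in that range.

```python
from typing import Callable, Optional, Tuple


def find_switch_point(
    strategy: Callable[[float], int],
    lo: float,
    hi: float,
    tol: float = 1e-3,
) -> Tuple[Optional[float], int]:
    """Bisect [lo, hi] for the point where the response leaves its initial level.

    `strategy` is treated as a black box and is assumed to be weakly
    monotonic in its single argument. Returns (switch point, number of
    experiments); the switch point is None if the response is constant.
    """
    r_lo, r_hi = strategy(lo), strategy(hi)
    experiments = 2
    if r_lo == r_hi:
        # Weak monotonicity plus equal end responses means no change in between.
        return None, experiments
    while hi - lo > tol:
        mid = (lo + hi) / 2
        experiments += 1
        if strategy(mid) == r_lo:
            lo = mid  # still at the initial level: the switch lies to the right
        else:
            hi = mid  # the response has already changed: the switch lies to the left
    return (lo + hi) / 2, experiments


# Hypothetical black-box strategy that changes its decision at 0.3.
point, runs = find_switch_point(lambda x: 1 if x >= 0.3 else 0, 0.0, 1.0)
print(point, runs)
```

An exhaustive search at the same resolution would need on the order
of (hi - lo) / tol experiments, whereas the number needed by
bisection grows only logarithmically with that ratio.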

The second module, QO-2, freezes the learning components of
an evolving strategy, invokes QO-1, and makes a snapshot, a
computer model of the strategy at successive stages of
development. It then computes the asymptotic form of the sequence
of snapshots, which will then be taken as a contributory strategy
for the normative strategy to be computed.

The third module, QO-3, minimizes the total number of
experiments performed in constructing the descriptive theory of a
strategy. It no longer assumes that the strategy response surface
is monotonic and also deals with multi-dimensional responses.
QO-3 starts with a balanced incomplete block design for
experiments and computes dynamically the specifications for each
subsequent experiment. In other words, the levels of the decision
variables in any single experiment and the length of the sequence
of experiments depend on the responses obtained in previous
experiments.

The fourth module, QO-4, performs the credit assignment. It
identifies the components of a strategy and assigns to each a
quality measure of 'outcomes'. An outcome need not be only the
immediate result of a sequence of actions prescribed by the
strategy but can also involve long-range consequences of planned
actions. An important extension of this subproject enables a
meta-strategy to channel the domain of confrontation to those
regions in which a given strategy is most proficient.

The fifth module, QO-5, constructs a 'Super Strategy' by
combining the best strategy components of all input strategies,
the descriptive theories, applicable for every region in the
total domain.
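
As a rough illustration of combining the best components region by
region, the sketch below picks, for each region, the component with
the highest quality measure. It is an assumption-laden toy, not the
QO-5 algorithm: the function build_super_strategy, the region and
strategy names, and the numeric scores are all hypothetical, and
quality is assumed to be a single number per component and region.

```python
from typing import Dict


def build_super_strategy(quality: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each region to the name of its highest-scoring strategy component."""
    return {
        region: max(scores, key=scores.get)
        for region, scores in quality.items()
        if scores  # skip regions for which no component was scored
    }


# Hypothetical quality measures, of the kind a credit-assignment step might assign.
quality = {
    "region_A": {"strategy_1": 0.62, "strategy_2": 0.81},
    "region_B": {"strategy_1": 0.74, "strategy_2": 0.55},
}
super_strategy = build_super_strategy(quality)
print(super_strategy)  # {'region_A': 'strategy_2', 'region_B': 'strategy_1'}
```

The same table, read row by row for one strategy, indicates the
regions in which that strategy scores best, which is the kind of
information a meta-strategy would need.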

Finally, the sixth module, QO-6, generates a Quasi-Optimum
strategy from the Super Strategy by eliminating inconsistencies
and redundancies from the latter. We refer to the result as
quasi-optimum rather than a normative theory for four reasons.
First, the resulting strategy is optimum only against the
original set of strategies considered. Another set may well
employ controllers and indicators for decision-making that are
superior to any in the "training" set. Second, the strategy is
normative only in the statistical sense. Fluctuations in the
adversary strategy, whether accidental or deliberate, impair the
performance of the QO strategy. Third, the adversary strategy may
change over time and some aspects of its dynamic behavior may
necessitate a change in the QO strategy. Finally, the generation
of both the descriptive theories (models) and of the normative
theory (the QO theory) is based on approximate and fallible
measurements.